1 Dataset description

There are two submissions: 10267 & 10270.

  • In each submission, 2390 families with .vcf files are included.
  • For each family, two vcf files are provided,
    • one named “sorted”.
    • the other named “annotated”.

1.1 Submission 10267

  • For files named “sorted”,
    • 852 families without GL/PL information
    • 1538 families with valid GL/PL information
      • 310 Trios
      • 1228 families with >=1 siblings
  • For files named “annotated”,
    • 1096 families without GL/PL information
    • 1294 families with valid GL/PL information
      • 309 Trios
      • 985 families with >=1 siblings

Note that for FID:13562, there is no father information in the .vcf file. Also, every family with valid GL/PL information in the “annotated” files also has valid GL/PL information in the “sorted” files.
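As an illustration of how a VCF record could be checked for GL/PL information, here is a minimal Python sketch. The report does not define “valid”; this sketch assumes it means that the FORMAT column lists GL or PL and that every sample carries a non-missing value for at least one of them. The function name has_gl_or_pl is hypothetical.

```python
def has_gl_or_pl(vcf_line: str) -> bool:
    """Return True if every sample on a VCF data line has a non-missing GL or PL.

    Assumption (not from the report): "valid" means the FORMAT column lists
    GL or PL and each sample has a value other than "." for one of them.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:  # columns 9+ hold FORMAT and the per-sample data
        return False
    keys = fields[8].split(":")
    wanted = [k for k in ("GL", "PL") if k in keys]
    if not wanted:
        return False
    for sample in fields[9:]:
        values = sample.split(":")
        found = False
        for key in wanted:
            idx = keys.index(key)
            # Trailing FORMAT fields may be dropped, so check the index first.
            if idx < len(values) and values[idx] not in (".", ""):
                found = True
        if not found:
            return False
    return True


if __name__ == "__main__":
    record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:PL\t0/1:20:30,0,40\t0/0:15:0,30,300\t0/0:18:0,45,400"
    print(has_gl_or_pl(record))
```

A family could then be counted as having GL/PL information if its records pass this check; how many records must pass is left open here.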



1.2 Submission 10270

  • For files named “sorted”, there is no GL/PL information.
  • For files named “annotated”,
    • 703 families without valid GL/PL information
      • including 13 families with fewer than 2000 variants.
    • 1687 families with valid GL/PL information
      • 292 Trios
      • 1395 families with >=1 siblings


1.3 Combined

Combining 10267 & 10270, there are 2206 families with complete vcf files:

  • 415 Trios
  • 1791 families with >=1 siblings


2 Call de novo mutations

Triodenovo was used to call de novo mutations:

  • Only variants with GL/PL information were retained.
  • Families were split into parent-offspring trios.
  • Filters: --minDP 7 --minDepth 10 and other default options
  • Post filters (following Homsy et al. 2015, Science):
    • For offspring: a minimum of 10 total reads, a minimum of 5 alternate allele reads, and a minimum alternate allele ratio of 20% if alternate allele reads ≥10, or 28% if alternate allele reads <10
    • For parents: a minimum depth of 10 reference reads and an alternate allele ratio <3.5%

The script is stored at /scratch/90days/uqywan67/auti_proj/SSC/scripts/call_deno.R
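The post filters listed above can be sketched in Python as follows. This is an illustration only and is not taken from call_deno.R. It assumes biallelic sites, so that total reads = reference reads + alternate reads; the function names are hypothetical.

```python
def offspring_passes(ref_reads: int, alt_reads: int) -> bool:
    """Offspring filter: >=10 total reads, >=5 alternate reads, and an
    alternate allele ratio of >=20% (alt reads >=10) or >=28% (alt reads <10)."""
    total = ref_reads + alt_reads  # assumption: biallelic site
    if total < 10 or alt_reads < 5:
        return False
    min_ratio = 0.20 if alt_reads >= 10 else 0.28
    return alt_reads / total >= min_ratio


def parent_passes(ref_reads: int, alt_reads: int) -> bool:
    """Parent filter: >=10 reference reads and alternate allele ratio <3.5%."""
    if ref_reads < 10:
        return False
    return alt_reads / (ref_reads + alt_reads) < 0.035


def dnm_passes(child, father, mother) -> bool:
    """Each argument is a (ref_reads, alt_reads) tuple for one trio member."""
    return (
        offspring_passes(*child)
        and parent_passes(*father)
        and parent_passes(*mother)
    )


if __name__ == "__main__":
    # Child with 30 ref / 10 alt reads, both parents with no alternate reads.
    print(dnm_passes((30, 10), (35, 0), (40, 0)))
```

Because the ratio threshold depends on the alternate read count, a call with 6 alternate reads out of 26 (about 23%) fails, while 10 alternate reads out of 40 (25%) passes.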




3 Annotation

  • ANNOVAR was used to annotate refGene and allele frequencies.
    • hg19refGene, exac03nonpsy, gnomad_exome211 databases were used.
    • Based on the annotation, DNMs were further filtered to retain:
      • exonic or canonical splice-site variants
      • variants with MAF <= 0.001 in the non-psychiatric subset of ExAC (Header: ExAC_nonpsych_ALL in ANNOVAR) and in the control samples of gnomAD (Header: controls_AF_popmax in ANNOVAR).
  • Gene-level pLI for PTVs was downloaded from ExAC
  • MPC scores for missense variants were annotated using VEP.
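The annotation-based filter above could be applied to one row of an ANNOVAR table roughly as in the Python sketch below. Two points are assumptions rather than statements from this report: the functional class is read from the Func.refGene column with the values exonic and splicing, and a frequency of “.” (variant absent from the database) is treated as 0 so that such variants are retained. The function names are hypothetical.

```python
MAF_CUTOFF = 0.001
FREQ_COLUMNS = ("ExAC_nonpsych_ALL", "controls_AF_popmax")
KEPT_FUNCTIONS = {"exonic", "splicing"}


def parse_freq(value) -> float:
    """Convert an ANNOVAR frequency field to a float.

    Assumption: "." or an empty field means the variant is absent from the
    database and is counted as frequency 0.
    """
    if value in (".", "", None):
        return 0.0
    return float(value)


def keep_dnm(row: dict) -> bool:
    """Keep exonic or splice-site variants with MAF <= 0.001 in both databases."""
    # Func.refGene may hold several classes joined by ';' (e.g. "exonic;splicing").
    functions = set(row["Func.refGene"].split(";"))
    if not functions & KEPT_FUNCTIONS:
        return False
    return all(parse_freq(row[col]) <= MAF_CUTOFF for col in FREQ_COLUMNS)


if __name__ == "__main__":
    example = {
        "Func.refGene": "exonic",
        "ExAC_nonpsych_ALL": ".",
        "controls_AF_popmax": "0.0005",
    }
    print(keep_dnm(example))
```

Rows can be supplied by csv.DictReader with a tab delimiter, since the filter only needs the three named columns.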

3.1 DNMs summary

After applying filters, a total of 4222 DNMs were found in 1763 families with 2438 offspring.

  • 3386/4222 (80.2%) DNMs matched published SSC DNMs from Krumm et al. 2015 and Iossifov et al. 2014.
  • 274 trio families (with 455 DNMs) and 1489 quad families (with 3767 DNMs, including 1876 DNMs in 1120 probands and 1891 DNMs in 1044 siblings).
  • 3617 DNMs in 2081 males and 605 DNMs in 357 females.
  • 2331 DNMs in 1394 probands and 1891 DNMs in 1044 siblings.
  • 2808 DNMs were not present in ExAC, 2900 DNMs were not present in gnomAD, and 2593 DNMs were absent from both datasets.
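The counts in this summary can be cross-checked against each other. The short Python sketch below only re-adds the figures reported above; it assumes that each trio family contributes exactly one offspring and that this offspring is the proband, which is what the reported totals imply. The function name summary_checks is hypothetical.

```python
def summary_checks() -> dict:
    """Re-add the reported DNM summary counts and flag any mismatch."""
    trio_families, trio_dnms = 274, 455
    quad_families, quad_dnms = 1489, 3767
    quad_probands, quad_proband_dnms = 1120, 1876
    quad_siblings, quad_sibling_dnms = 1044, 1891

    return {
        "families": trio_families + quad_families == 1763,
        "total DNMs": trio_dnms + quad_dnms == 4222,
        "quad DNMs": quad_proband_dnms + quad_sibling_dnms == quad_dnms,
        # Assumption: one offspring (the proband) per trio family.
        "offspring": trio_families + quad_probands + quad_siblings == 2438,
        "probands": trio_families + quad_probands == 1394,
        "proband DNMs": trio_dnms + quad_proband_dnms == 2331,
        "DNMs by sex": 3617 + 605 == 4222,
        "offspring by sex": 2081 + 357 == 2438,
        "published overlap": round(3386 / 4222 * 100, 1) == 80.2,
    }


if __name__ == "__main__":
    for name, ok in summary_checks().items():
        print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```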


3.2 DNMs in quad families

  • A total of 3767 DNMs were observed in 1489 quad families
    • 1876 DNMs in 1120 probands and 1891 DNMs in 1044 siblings
    • 3282 DNMs in 1892 males and 485 DNMs in 272 females.


4 Burden test analysis